ABSTRACT

We report the nucleotide sequence of a gene encoding the constant region of a
human immunoglobulin γ1 heavy chain (Cγ1). A comparison of this sequence with
those of the Cγ2 and Cγ4 genes reveals that these three human Cγ genes share
considerable homology in both coding and noncoding regions. The nucleotide sequence
differences indicate that these genes diverged from one another approximately 6-8
million years ago. An examination of hinge exons shows that these coding regions
have evolved more rapidly than any other areas of the Cγ genes in terms of both base
substitution and deletion/insertion events. Coding sequence diversity is also observed
in areas of the CH domains which border the hinge.

INTRODUCTION 

Immunoglobulin G (IgG) molecules in humans are divided into four subclasses
based on the presence of particular gamma heavy chain constant regions (Cγ). These
Cγ regions (Cγ1, Cγ2, Cγ3, and Cγ4) are encoded by distinct germline genes (1)
which are presumed to be the products of gene duplication of an ancestral Cγ gene.
Several species of mammals have been shown to possess IgG subclasses, although the
number of subclasses varies for different species. For example, both humans and
mice have four subclasses, while guinea pigs have two and rabbits have only a single
type of IgG. Structural studies at the protein and DNA level have been carried out
with several species, and have shown that the homology relationships within the Cγ
gene families are different for different mammals (2-9). For example, human Cγ
protein regions are over 90% homologous (2-5), while mouse Cγ genes share
significantly less homology (70-80% at the nucleotide level (6-8)). Moreover, cross-
species comparisons reveal no clear correspondence between individual human and
mouse genes. These intra- and interspecies homology relationships, as well as the
different numbers of Cγ genes found in different mammals, indicate that the various
mammalian Cγ gene families have evolved quite differently since the time of
mammalian speciation.

We are interested in studying structural features of human Cγ genes in order to
gain insights into the evolution of the human Cγ gene family. We have previously
characterized the Cγ2 and Cγ4 genes (10,11). In this paper we report the complete
nucleotide sequence of a Cγ1 gene and compare the three human sequences.

© IRL Press Limited, Oxford, England. 0305-1048/82/1013-4071$2.00/0
Nucleic Acids Research

MATERIALS AND METHODS 
Materials

The human fetal liver DNA library was obtained from T. Maniatis. Sources of 
nucleic acid enzymes, reagents for DNA sequencing, E. coli strain JM101, and the 
phage M13mp2 were those described by Steinmetz et al. (12). 
Isolation and restriction mapping of a human Cγ1 genomic clone

Screening of a human fetal liver DNA library cloned in lambda Charon 4A
bacteriophage with a human Cγ3 cDNA probe was done as previously described (10).
Mapping of restriction sites for the enzymes Eco RI, Bam HI, Hind III, Xba I, Bgl II,
and Pvu II was done by analysis of single and double digests with these enzymes.
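
To illustrate what a restriction map records, the following minimal Python sketch locates recognition sites for the six enzymes named above in a DNA string and lists the fragment lengths expected from single and double digests. The recognition sequences are the standard ones for these enzymes. The function names `find_sites` and `fragment_lengths` and the toy sequence are hypothetical and are not derived from clone HG3A; the mapping reported here was done experimentally. Positions refer to the first base of each recognition site, so the exact cleavage offset within the site is ignored.

```python
# Standard recognition sequences (5'->3'); all six are palindromic,
# so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "Eco RI": "GAATTC",
    "Bam HI": "GGATCC",
    "Hind III": "AAGCTT",
    "Xba I": "TCTAGA",
    "Bgl II": "AGATCT",
    "Pvu II": "CAGCTG",
}


def find_sites(sequence, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} of every recognition site."""
    seq = sequence.upper()
    found = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        found[enzyme] = positions
    return found


def fragment_lengths(sequence_length, cut_positions):
    """Lengths of the linear fragments produced by cutting at the given positions."""
    edges = [0] + sorted(set(cut_positions)) + [sequence_length]
    return [right - left for left, right in zip(edges, edges[1:]) if right > left]


# Hypothetical toy sequence with two Eco RI sites and one Bam HI site.
toy = "AAGAATTCTTGGATCCAAGAATTC"
sites = find_sites(toy)
single_digest = fragment_lengths(len(toy), sites["Eco RI"])
double_digest = fragment_lengths(len(toy), sites["Eco RI"] + sites["Bam HI"])
```

Comparing the single-digest and double-digest fragment lists is what allows the relative order of sites to be deduced.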
Subcloning and DNA sequence analysis 

The 3.0 kb Hind III-Pvu II fragment of clone HG3A (see Fig. 1) was digested
separately with frequent-cutting restriction enzymes and the products were subcloned
into the phage M13mp2 as described (11). Subclones were chosen for sequence
analysis following screening of plaques with a labelled genomic fragment containing a
full-length Cγ4 gene (see refs. 10 and 11). DNA sequencing of individual subclones
was carried out as described (11). The composite DNA sequence was determined
either by overlaps of sequenced regions or by homology of the translated DNA
sequence to existing sequence data for a human immunoglobulin γ1 protein (2).

RESULTS AND DISCUSSION 

The primary structure of a human Cγ1 gene

We have previously described the isolation of human Cγ genes from a
recombinant phage library of fetal liver DNA, using as hybridization probe a cDNA
encoding part of a Cγ3 gene (10). One of these clones, HG3A, is shown
diagrammatically in Fig. 1. The restriction map of this clone indicated that it is a
distinct species from the clones shown to contain Cγ2 and Cγ4 genes (10,11). A
2.0 kb region from clone HG3A containing sequences hybridizing to a full-length Cγ4
gene was sequenced by the dideoxynucleotide chain-termination method in the phage
M13mp2. The sequence obtained is shown in Fig. 2, where we see that the gene has
the same basic exon-intron organization that has been previously observed for both
human (10,11) and mouse (6-8) Cγ genes. The three CH domains and the hinge
segment of the polypeptide are encoded in individual exons that are separated from
one another by introns, the largest one lying between the CH1 and hinge exons.

Figure 1. Restriction map and sequencing strategy of a cloned human DNA fragment
containing a Cγ1 gene. Letters on the top line refer to cleavage sites for the
following restriction enzymes: B, Bam HI; H, Hind III; Bg, Bgl II; P, Pvu II; X, Xba I.
Only the indicated Pvu II site was mapped, although this enzyme also cuts in other
places in the clone. The arrow under the solid block indicates the direction of
transcription. The dashed lines lead to an enlarged view of the region which was
sequenced. Individual exons are shown here as solid blocks, whereas introns are not
indicated at the top of the Figure. The arrowed lines represent the extent and
direction of sequence determinations of individual subclones generated using the
indicated enzymes.

The predicted amino acid residues are listed above the corresponding codons in Fig. 2, and
a comparison of this protein sequence with those of the heavy chains of the two human
IgG1 molecules Eu (2) and Nie (13) leads to an unambiguous designation of the cloned
sequence as a Cγ1 gene. Except for differences in amide assignments of several
residues, the encoded protein sequence differs from the Eu sequence at just three of
329 compared residues, and only one difference is seen in a comparison with the Nie
heavy chain. These differences do not include the lysine encoded at the C-terminus
of the CH3 domain, which has been observed in mouse (6-8) and human (10,11) Cγ
genes but does not appear in the mature polypeptides. Table 1 compares the lengths
of the exons and introns of the human and mouse Cγ genes that have been sequenced
to date. Although some variation is seen in the lengths of noncoding regions and
hinge exons, the overall organization of the Cγ genes is conserved in humans and
mice.
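
The exon lengths given in Table 1 can be converted to codon counts as a quick consistency check. The Python sketch below does this for the three human genes. It assumes that each exon can be treated as a whole number of codons, which is a simplification because a codon may be split across a splice junction; the names `EXON_LENGTHS` and `codons_per_exon` are hypothetical.

```python
# Exon lengths in nucleotides, copied from Table 1 (human genes only).
EXON_LENGTHS = {
    "human gamma1": {"CH1": 294, "hinge": 45, "CH2": 330, "CH3": 321},
    "human gamma2": {"CH1": 294, "hinge": 36, "CH2": 327, "CH3": 321},
    "human gamma4": {"CH1": 294, "hinge": 36, "CH2": 330, "CH3": 321},
}


def codons_per_exon(exon_lengths):
    """Convert nucleotide lengths to codon counts, rejecting non-multiples of 3."""
    counts = {}
    for exon, length in exon_lengths.items():
        if length % 3 != 0:
            raise ValueError(f"{exon}: {length} nt is not a whole number of codons")
        counts[exon] = length // 3
    return counts


codon_counts = {gene: codons_per_exon(exons) for gene, exons in EXON_LENGTHS.items()}
total_codons = {gene: sum(counts.values()) for gene, counts in codon_counts.items()}
```

Under this simplification the γ1 exons total 330 codons, and the γ2 and γ4 hinge exons each hold three fewer codons than the 15-codon γ1 hinge.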

[Figure 2: nucleotide sequence listing with the predicted amino acids above the codons; the listing itself is not legible in this copy.]

Figure 2. The nucleotide sequence of a human Cγ1 gene and the corresponding protein
sequence. The sequence of the mRNA synonymous strand is listed 5' to 3'. Amino
acids predicted by the DNA sequence are listed in one-letter code above the
respective codons. "Stop" indicates the termination codon UGA. The presumptive
poly(A) addition signal sequence is marked by an asterisk.

Antigenic determinants have been found on human IgG molecules which can
serve as genetic markers for CH regions (14). Some of these allelic variants, called
allotypes, have been correlated with specific amino acid residues in the heavy chains
(15). We find that the discrepant residues in the Eu heavy chain and the encoded
polypeptide reported here can be correlated with certain of these allotypic markers.
The lysine encoded at position 97 of the CH1 domain (Fig. 2) correlates with the Gm
(17) determinant, while the arginine at the corresponding place in the Eu heavy chain
is associated with the Gm (3) marker. Similarly, the asp-glu-leu sequence at positions
16-18 of the CH3 domain of the cloned gene is believed to represent the Gm (1)
allotypic determinant, whereas the glu-glu-met present in Eu correlates with the Gm
(non-1) variant. Thus the cloned gene reported here encodes a polypeptide with the
genetic markers Gm (1,17). The Nie heavy chain also carries these markers, yet
differs at amino acid number 41 of the CH3 domain (Nie has arginine, whereas the
cloned sequence has a tryptophan codon).
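
The residue-to-marker correlations described in this paragraph can be written as a small lookup. The Python sketch below is only an illustration: the four residue/marker associations come from the text, while the rule-table layout, the function name `call_allotypes`, and the toy domain sequences are hypothetical. Positions are counted from 1 within the named domain.

```python
# (domain, 1-based position, residues in one-letter code, marker), as stated in the text.
ALLOTYPE_RULES = [
    ("CH1", 97, "K", "Gm(17)"),
    ("CH1", 97, "R", "Gm(3)"),
    ("CH3", 16, "DEL", "Gm(1)"),
    ("CH3", 16, "EEM", "Gm(non-1)"),
]


def call_allotypes(domains, rules=ALLOTYPE_RULES):
    """Return the markers whose diagnostic residues are present.

    `domains` maps a domain name to its protein sequence in one-letter code.
    """
    calls = []
    for domain, position, residues, marker in rules:
        seq = domains.get(domain, "").upper()
        start = position - 1
        if seq[start:start + len(residues)] == residues:
            calls.append(marker)
    return calls


# Hypothetical toy domains: only the diagnostic positions are meaningful.
cloned_like = {"CH1": "A" * 96 + "K" + "V", "CH3": "G" * 15 + "DEL" + "T"}
eu_like = {"CH1": "A" * 96 + "R" + "V", "CH3": "G" * 15 + "EEM" + "T"}
```

With these toy inputs, `call_allotypes(cloned_like)` yields the Gm (1,17) combination assigned to the cloned gene, and `call_allotypes(eu_like)` yields the markers associated with Eu.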
Sequence divergence among three human Cγ genes

We have previously reported the nucleotide sequences of genes encoding CH
regions of human γ2 and γ4 heavy chains (10,11). Our analysis of a Cγ1 gene allows a
comparison of three members of the human Cγ gene family. A summary of the
nucleotide sequence comparisons is shown in Table 2. Nucleotide differences in the
various noncoding regions are similar, and so values are listed for the total divergence
in noncoding DNA. Similarly, each of the CH exons shows similar homologies among
the three genes, and the total observed differences for these exons are given. Hinge
exons, on the other hand, show much greater variation than any other gene segment,
and these regions are compared separately. Table 2 shows that the level of
nucleotide substitution (not including gaps) in noncoding areas is not much greater
than the total (silent plus amino acid replacement) seen in the CH coding regions.
Except for areas surrounding the site of polyadenylation of the mRNA (16) and splice
junctions (17), the noncoding segments of these genes have no known function. If
these sequences are without any function, they are presumably not subjected to
natural selection and are free to diverge. Estimates of the rate of appearance of
nucleotide substitutions in unselected noncoding DNA (18) lead us to conclude that
approximately 6-8 million years have elapsed since any two of these genes shared an
identical sequence. The similar homology levels seen in the three pairwise
comparisons make it difficult to determine which two genes shared the most recent
common ancestor. However, significantly fewer gaps need to be placed in the
noncoding areas of the Cγ2 and Cγ4 genes to maintain the homology alignment of the
two sequences. This observation, along with the determined linkage of these genes
(11), suggests that they diverged more recently from each other than from the Cγ1
gene.

Table 1. Intron and exon lengths in Cγ genes

                      length of gene segment (nucleotides)

                       CH1-hinge           hinge-CH2          CH2-CH3
Cγ gene      CH1       intron      hinge   intron      CH2    intron     CH3    3' UT
human γ1     294       388         45      118         330    96         321    ~130
human γ2     294       392         36      118         327    97         321    ~130
human γ4     294       390         36      118         330    97         321    ~130
mouse γ1     291       356         39      98          321    121        321    93
mouse γ2a    291       310         48      107         330    112        321    103
mouse γ2b    291       316         66      107         330    112        321    103

The data for the mouse genes are from reference 8. The human γ2 and γ4 numbers come
from references 11 and 10, respectively. The lengths of the 3' untranslated (UT) regions
in the human genes are determined by homology to the corresponding regions in mouse
Cγ genes (see Fig. 5 of reference 10).

Table 2. Nucleotide sequence comparisons of three human immunoglobulin Cγ genes

                                     % nucleotide difference*

                                     CH exons                 Hinge exons
genes         total noncoding
compared      areas                  silent    replacement    silent    replacement
γ1 vs. γ2     4.7 (14 gaps)†         1.6       1.9            2.7       11.1
γ1 vs. γ4     5.4 (18 gaps)          2.3       2.2            2.7       16.7
γ2 vs. γ4     4.5 (4 gaps)           2.0       1.6            3.3       16.7

* This is calculated as (number of substitutions/number of residues compared) x 100.
Gaps were not compared.

† These were introduced into one or another of the compared sequences to maintain
the homology alignment.
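
The percentage used in Table 2, (number of substitutions/number of residues compared) x 100 with gapped positions excluded, can be sketched in Python as follows, together with the simplest conversion of a divergence value into elapsed time. Several points here are assumptions rather than the procedure of this paper: the function names are hypothetical, the function counts gapped columns rather than gap events, the time estimate uses t = d/(2r) with no correction for multiple substitutions at one site, and the rate in the usage lines is a placeholder chosen for illustration, not the value from reference 18.

```python
def percent_difference(aligned_a, aligned_b, gap="-"):
    """Return (% substitutions among ungapped columns, number of gapped columns)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    substitutions = 0
    gap_columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            gap_columns += 1  # gaps are tallied but not compared
            continue
        compared += 1
        if a != b:
            substitutions += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * substitutions / compared, gap_columns


def divergence_time_years(fraction_different, rate_per_site_per_year):
    """t = d / (2r): both lineages accumulate substitutions after the split."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return fraction_different / (2.0 * rate_per_site_per_year)


# Hypothetical toy alignment: 8 compared columns, 1 substitution, 2 gapped columns.
pct, gaps = percent_difference("ACGTACGTAC", "ACGAAC--AC")

# Placeholder rate (per site per year, per lineage), for illustration only.
years = divergence_time_years(0.05, 3.5e-9)
```

Because the time estimate scales inversely with the assumed rate, any figure obtained this way is only as reliable as the rate supplied.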

Coding sequence divergence in and near the hinge

The most interesting areas of these genes in evolutionary terms are the hinge
exons, which Table 2 indicates are the most divergent gene segments. The
differences listed do not reflect the fact that the Cγ2 and Cγ4 hinge exons encode
three fewer amino acids than the Cγ1 hinge exon, which codes for 15 residues. The
DNA sequence alignment giving maximum homology among the three genes in this
exon is shown in Fig. 3. Here we see that distinct nine-nucleotide gaps are placed in
the Cγ2 and Cγ4 sequences. On either side of these gaps are small coding stretches
which are homologous in the three Cγ genes. Every nucleotide substitution indicated
in the Cγ2 and Cγ4 sequences is in a triplet which encodes an amino acid unique to
that hinge region. The combination of nucleotide substitution and insertion/deletion
events leads to quite different coding properties in the hinge exons for the three
genes. Fig. 4 shows the predicted amino acid sequences for the three hinge segments,
as well as some contiguous residues in the CH1 and CH2 domains. The alignment
shows that coding sequence diversity is not limited to the hinge exon itself, but is also
found in areas of the CH domains which are adjacent to the hinge. Again both base
substitution and insertion/deletion events produce coding differences; the latter type
of event leads to nucleotides in the CH2 exon of the Cγ2 gene being read in a
different translational reading frame than their homologous counterparts in the other
two genes (see Fig. 2 of ref. 11). Thus although the three genes encode polypeptides
which are at least 95% identical over most of their length, amino acid substitutions
are clustered in the hinge areas of the proteins. We believe that the high level of
divergence in this region exists because natural selection favors the generation of
diversity in this part of the molecule. This is not to say that the rate of nucleotide
substitution is greater in the hinge than in the more conserved noncoding regions, but
rather that substitutions in the hinge area are more rapidly fixed by selection. The
nature of the selective advantage offered by hinge variation is not obvious, although
it has been suggested that divergent hinges may be responsible for the differences in
effector functions carried out by IgG subclasses (3,19,20). If this view is correct,
then the generation of new and diverse effector functions may be the selective force
which fixes nucleotide changes in the hinge area and the hinge exon itself.

γ1  GAGCCCAAATCTTGTGACAAAACTCACACATGCCCACCGTGCCCA
[γ2 and γ4 rows: only fragments survive in this copy (γ2: G G T-G-G; γ4: T, -A-G-, -C-C-, -r-A-); their column alignment is lost.]

Figure 3. Comparison of hinge exon nucleotide sequences. Solid lines represent
identity of the γ2 and γ4 sequences to the γ1 sequence. Where differences occur in
the γ2 and γ4 exons, the relevant residues are listed. Gaps are introduced into the γ2
and γ4 listings to maximize homology to the γ1 sequence.
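
The coding properties of the hinge exon can be checked directly by translating the γ1 hinge sequence listed in Fig. 3. The Python sketch below uses the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical, and the nine-nucleotide deletion in the last lines is placed at an arbitrary codon boundary purely to show the effect of removing a multiple of three nucleotides. It is not the position of the gaps in the γ2 or γ4 alignment.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# gamma1 hinge exon as listed in Fig. 3 (45 nucleotides).
GAMMA1_HINGE_EXON = "GAGCCCAAATCTTGTGACAAAACTCACACATGCCCACCGTGCCCA"
gamma1_hinge_protein = translate(GAMMA1_HINGE_EXON)

# Arbitrary in-frame removal of nine nucleotides (three codons), for illustration.
shortened = GAMMA1_HINGE_EXON[:15] + GAMMA1_HINGE_EXON[24:]
shortened_protein = translate(shortened)
```

The translation gives the 15 hinge residues shown for γ1 in Fig. 4, and an in-frame nine-nucleotide deletion leaves 12 residues without disturbing the reading frame of the flanking codons.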



        CH1          HINGE            CH2
γ1  HKPSNTKVDKKV|EPKSCDKTHTCPPCP|APELLGGPSVFLFPPKPKDTLMISRT
[γ2 and γ4 rows: only fragments survive in this copy (-R-; S-YG PP- S-; PVA; -F-); their column alignment is lost.]

Figure 4. Comparison of amino acid residues in the hinge area of three Cγ
polypeptides. Vertical lines separate the hinge residues from those contiguous amino
acids which are encoded in the CH1 and CH2 exons. Amino acids are listed in the
one-letter code. Solid lines represent identity of the γ2 and γ4 sequences to the γ1
sequence. The CH2 domain of the Cγ2 sequence contains one fewer amino acid than is
found in the other genes.

An unresolved evolutionary issue

Our current picture of human Cγ genes is that they have diverged recently
from one another, and that hinge regions have evolved rapidly since that divergence.
What is not clear is the nature of the genetic event(s) giving rise to the identical
Cγ genes which were the ancestors of the present-day genes. There are two likely
alternatives for the generation of two or more identical sequences: (1) a duplication
of a single gene sequence, thus producing a gene de novo, and (2) a gene correction
process (21) in which all or part of the sequence of one gene is replaced by the
sequence of a nonallelic but homologous gene. The latter explanation implies that
members of a multigene family do not evolve independently of one another, but
rather that genetic information can be exchanged between nonallelic members of a
gene cluster. Molecular evidence for the occurrence of such events has been cited
for human (22,23) and mouse (24) globin genes and for mouse immunoglobulin genes (8).
Such evidence consists of the finding of a presumed recombination breakpoint which
separates areas of a gene which either were or were not involved in a genetic
exchange with another member of the gene family. This breakpoint defines a
relatively sharp boundary on either side of which two nonallelic genes share different
levels of homology. A boundary of this kind is not found in a comparison of the three
human Cγ genes, since except for the extensive divergence found in the hinge region,
the nucleotide differences are distributed rather evenly over the length of the genes.
If evidence exists for recombination between any two of these nonallelic genes, it is
most likely to be found in regions flanking the coding areas that we have
characterized.
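
A recombination breakpoint of the kind described above would appear as an abrupt change in divergence along an alignment of two genes. One simple way to look for such a boundary, sketched here in Python, is to compute the percent difference in successive windows. This is an illustration rather than the analysis performed in this work: the function name `windowed_difference`, the window and step sizes, and the toy sequences are all hypothetical, and trailing positions that do not fill a complete window are ignored.

```python
def windowed_difference(aligned_a, aligned_b, window=100, step=50, gap="-"):
    """Return [(window start, % difference)] along a pairwise alignment.

    Gapped columns are skipped; a window with no ungapped columns yields None.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    a_seq, b_seq = aligned_a.upper(), aligned_b.upper()
    profile = []
    last_start = max(len(a_seq) - window, 0)
    for start in range(0, last_start + 1, step):
        compared = 0
        substitutions = 0
        for a, b in zip(a_seq[start:start + window], b_seq[start:start + window]):
            if a == gap or b == gap:
                continue
            compared += 1
            if a != b:
                substitutions += 1
        pct = 100.0 * substitutions / compared if compared else None
        profile.append((start, pct))
    return profile


# Hypothetical toy alignment: identical first half, fully divergent second half.
toy_profile = windowed_difference("A" * 20, "A" * 10 + "C" * 10, window=10, step=10)
```

An evenly distributed pattern of differences, as reported for the three human genes, would give a roughly flat profile, whereas a sharp step between adjacent windows would mark a candidate boundary.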

Thus we are unable to distinguish between the above two alternatives, although
we have argued (11) that gene duplication and gene correction are not mutually
exclusive concepts. The same kinds of fundamental genetic processes that result in
gene duplication can also bring about gene correction. We think it likely that these
genetic processes have continued to act on human Cγ genes since the occurrence of
the initial duplication event(s). According to this view, our estimated time of
divergence of human Cγ genes represents the time elapsed since the most recent
correction event. Thus we believe that the human Cγ gene family is probably much
older than indicated by the extensive homology shared by its members.